AITopics | review quality

Counterfactual Evaluation of Peer-Review Assignment Policies

Neural Information Processing Systems | Feb-16-2026, 17:59:41 GMT

As a result, these and other large conferences must rely on automated systems to decide which members of the impaneled reviewer pool will review each paper.

artificial intelligence, assignment, machine learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > Canada > Ontario > Toronto (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Counterfactual Evaluation of Peer-Review Assignment Policies

Neural Information Processing Systems | Dec-26-2025, 15:15:08 GMT

Peer review assignment algorithms aim to match research papers to suitable expert reviewers, working to maximize the quality of the resulting reviews. A key challenge in designing effective assignment policies is evaluating how changes to the assignment algorithm map to changes in review quality. In this work, we leverage recently proposed policies that introduce randomness in peer-review assignment (in order to mitigate fraud) as a valuable opportunity to evaluate counterfactual assignment policies. Specifically, we exploit how such randomized assignments provide a positive probability of observing the reviews of many assignment policies of interest. To address challenges in applying standard off-policy evaluation methods, such as violations of positivity, we introduce novel methods for partial identification based on monotonicity and Lipschitz smoothness assumptions for the mapping between reviewer-paper covariates and outcomes. We apply our methods to peer-review data from two computer science venues: the TPDP'21 workshop (95 papers and 35 reviewers) and the AAAI'22 conference (8,450 papers and 3,145 reviewers). We consider estimates of (i) the effect on review quality when changing weights in the assignment algorithm, e.g., weighting reviewers' bids vs. textual similarity (between the reviewer's past papers and the submission), and (ii) the cost of randomization, capturing the difference in expected quality between the perturbed and unperturbed optimal match. We find that placing higher weight on text similarity results in higher review quality and that introducing randomization in the reviewer-paper assignment only marginally reduces the review quality. Our methods for partial identification may be of independent interest, while our off-policy approach can likely find use in evaluating a broad class of algorithmic matching systems.
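
The off-policy idea in the abstract above can be illustrated with a small sketch. It assumes each reviewer-paper pair has a known assignment probability under the deployed randomized policy and under a counterfactual policy, and that review quality is observed only for pairs that were actually assigned. The names `Pair` and `ipw_bounds` are hypothetical. Pairs that violate positivity are bounded here with the simplest worst-case and best-case outcome values, a cruder stand-in for the monotonicity and Lipschitz assumptions the abstract describes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pair:
    """One reviewer-paper pair (hypothetical structure, for illustration only)."""
    p_log: float               # assignment probability under the deployed randomized policy
    p_new: float               # assignment probability under the counterfactual policy
    quality: Optional[float]   # observed review quality, or None if the pair was not assigned


def ipw_bounds(pairs, y_min, y_max):
    """Bound the mean review quality of the counterfactual policy.

    Supported pairs (p_log > 0) are handled by inverse probability weighting.
    Unsupported pairs (p_log == 0, p_new > 0) cannot be estimated from data,
    so their outcome is replaced by y_min (lower bound) or y_max (upper bound).
    """
    total_new = sum(p.p_new for p in pairs)  # expected number of assignments under the new policy
    if total_new <= 0:
        raise ValueError("counterfactual policy assigns no pairs")

    identified = 0.0        # weighted sum of observed outcomes on supported pairs
    unsupported_mass = 0.0  # counterfactual probability mass with no support in the data
    for p in pairs:
        if p.p_new == 0:
            continue
        if p.p_log == 0:
            unsupported_mass += p.p_new
        elif p.quality is not None:
            identified += p.quality * p.p_new / p.p_log

    lower = (identified + unsupported_mass * y_min) / total_new
    upper = (identified + unsupported_mass * y_max) / total_new
    return lower, upper


# Toy data: quality on a 1-5 scale, one pair never assignable under the deployed policy.
example_pairs = [
    Pair(p_log=0.5, p_new=0.25, quality=4.0),
    Pair(p_log=0.5, p_new=0.25, quality=None),
    Pair(p_log=1.0, p_new=0.25, quality=3.0),
    Pair(p_log=0.0, p_new=0.25, quality=None),
]
print(ipw_bounds(example_pairs, y_min=1.0, y_max=5.0))
```

The width of the returned interval grows with the probability mass the counterfactual policy places on pairs that the deployed policy could never assign, which is why additional structural assumptions are needed to tighten it.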

artificial intelligence, proceedings, review quality, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.39)

Counterfactual Evaluation of Peer-Review Assignment Policies

Neural Information Processing Systems | Oct-9-2025, 05:40:36 GMT

As a result, these and other large conferences must rely on automated systems to decide which members of the impaneled reviewer pool will review each paper.

artificial intelligence, assignment, machine learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > Canada > Ontario > Toronto (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.46)

ReviewRL: Towards Automated Scientific Review with RL

Zeng, Sihang, Tian, Kai, Zhang, Kaiyan, Wang, Yuru, Gao, Junqi, Liu, Runze, Yang, Sa, Li, Jingxuan, Long, Xinwei, Ma, Jiaheng, Qi, Biqing, Zhou, Bowen

arXiv.org Artificial Intelligence | Aug-15-2025

Peer review is essential for scientific progress but faces growing challenges due to increasing submission volumes and reviewer fatigue. Existing automated review approaches struggle with factual accuracy, rating consistency, and analytical depth, often generating superficial or generic feedback lacking the insights characteristic of high-quality human reviews. We introduce ReviewRL, a reinforcement learning framework for generating comprehensive and factually grounded scientific paper reviews. Our approach combines: (1) an ArXiv-MCP retrieval-augmented context generation pipeline that incorporates relevant scientific literature, (2) supervised fine-tuning that establishes foundational reviewing capabilities, and (3) a reinforcement learning procedure with a composite reward function that jointly enhances review quality and rating accuracy. Experiments on ICLR 2025 papers demonstrate that ReviewRL significantly outperforms existing methods across both rule-based metrics and model-based quality assessments. ReviewRL establishes a foundational framework for RL-driven automatic critique generation in scientific discovery, demonstrating promising potential for future development in this domain. The implementation of ReviewRL will be released on GitHub.
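
The abstract does not spell out the composite reward, so the following is only a sketch of what a reward mixing review quality and rating accuracy could look like. Everything in it is an assumption: the function names, the linear mixing weight, the 1 to 10 rating scale, and the premise that a quality score in [0, 1] comes from some external judge.

```python
def rating_reward(predicted, reference, lo=1.0, hi=10.0):
    """Reward in [0, 1] that shrinks linearly with the rating error (assumed form)."""
    error = abs(predicted - reference) / (hi - lo)
    return max(0.0, 1.0 - error)


def composite_reward(quality_score, predicted, reference, quality_weight=0.5):
    """Mix a review-quality score with a rating-accuracy term.

    quality_score is assumed to lie in [0, 1] and to be produced elsewhere,
    for example by a judge model; quality_weight is a hypothetical knob.
    """
    if not 0.0 <= quality_score <= 1.0:
        raise ValueError("quality_score must be in [0, 1]")
    if not 0.0 <= quality_weight <= 1.0:
        raise ValueError("quality_weight must be in [0, 1]")
    accuracy = rating_reward(predicted, reference)
    return quality_weight * quality_score + (1.0 - quality_weight) * accuracy


# A review judged 0.8 on quality whose predicted rating matches the reference.
print(composite_reward(0.8, predicted=6, reference=6))
```

A single scalar of this kind lets a policy-gradient method trade off the two goals, though the weighting that works in practice would have to be tuned.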

arxiv preprint arxiv, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2508.10308

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Heilongjiang Province > Harbin (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
(2 more...)

Position: The AI Conference Peer Review Crisis Demands Author Feedback and Reviewer Rewards

Kim, Jaeho, Lee, Yunseok, Lee, Seulki

arXiv.org Artificial Intelligence | May-9-2025

The peer review process in major artificial intelligence (AI) conferences faces unprecedented challenges with the surge of paper submissions (exceeding 10,000 submissions per venue), accompanied by growing concerns over review quality and reviewer responsibility. This position paper argues for the need to transform the traditional one-way review system into a bi-directional feedback loop where authors evaluate review quality and reviewers earn formal accreditation, creating an accountability framework that promotes a sustainable, high-quality peer review system. The current review system can be viewed as an interaction between three parties: the authors, reviewers, and system (i.e., conference), where we posit that all three parties share responsibility for the current problems. However, issues with authors can only be addressed through policy enforcement and detection tools, and ethical concerns can only be corrected through self-reflection. As such, this paper focuses on reforming reviewer accountability with systematic rewards through two key mechanisms: (1) a two-stage bi-directional review system that allows authors to evaluate reviews while minimizing retaliatory behavior, and (2) a systematic reviewer reward system that incentivizes quality reviewing. We ask for the community's strong interest in these problems and the reforms that are needed to enhance the peer review process.

data mining, large language model, machine learning, (22 more...)

arXiv.org Artificial Intelligence

2505.04966

Country:

Asia > South Korea > Ulsan > Ulsan (0.04)
North America > Canada (0.04)

Genre:

Overview (1.00)
Research Report > Experimental Study (0.67)

Industry: Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.74)
(4 more...)

CRScore: Grounding Automated Evaluation of Code Review Comments in Code Claims and Smells

Naik, Atharva, Alenius, Marcus, Fried, Daniel, Rose, Carolyn

arXiv.org Artificial Intelligence | Sep-29-2024

The task of automated code review has recently gained a lot of attention from the machine learning community. However, current review comment evaluation metrics rely on comparisons with a human-written reference for a given code change (also called a diff), even though code review is a one-to-many problem, like generation and summarization, with many "valid reviews" for a diff. To tackle these issues, we develop CRScore, a reference-free metric that measures dimensions of review quality such as conciseness, comprehensiveness, and relevance. We design CRScore to evaluate reviews in a way that is grounded in claims and potential issues detected in the code by LLMs and static analyzers. We demonstrate that CRScore can produce valid, fine-grained scores of review quality that have the greatest alignment with human judgment (0.54 Spearman correlation) and are more sensitive than reference-based metrics. We also release a corpus of 2.6k human-annotated review quality scores for machine-generated and GitHub review comments to support the development of automated metrics.
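
One way to picture a reference-free score grounded in detected claims is a precision and recall style comparison between review sentences and the claims or issues found in the diff. The sketch below is an assumption about how such a metric could be wired up, not the CRScore implementation: `review_scores` is a hypothetical name, the threshold is arbitrary, and token-level Jaccard overlap is swapped in for whatever semantic similarity model a real system would use.

```python
def jaccard(a, b):
    """Token-overlap similarity, used here as a simple stand-in for semantic similarity."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def review_scores(review_sentences, claims, threshold=0.5, sim=jaccard):
    """Return (conciseness, comprehensiveness, relevance) for one review.

    conciseness:       share of review sentences that match some claim
    comprehensiveness: share of claims covered by some review sentence
    relevance:         harmonic mean of the two
    """
    matched_sentences = sum(
        any(sim(s, c) >= threshold for c in claims) for s in review_sentences
    )
    matched_claims = sum(
        any(sim(s, c) >= threshold for s in review_sentences) for c in claims
    )
    conciseness = matched_sentences / len(review_sentences) if review_sentences else 0.0
    comprehensiveness = matched_claims / len(claims) if claims else 0.0
    total = conciseness + comprehensiveness
    relevance = 2 * conciseness * comprehensiveness / total if total else 0.0
    return conciseness, comprehensiveness, relevance


# Toy example: one on-topic sentence, one filler sentence, two detected claims.
review = ["the loop index is off by one", "nice work"]
claims = ["loop index is off by one", "the function lacks a null check"]
print(review_scores(review, claims))
```

Because nothing here depends on a single human-written reference, two differently worded reviews that cover the same claims would receive similar scores.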

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2409.19801

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > Dominican Republic (0.04)
(4 more...)

Genre:

Research Report > Experimental Study (0.67)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Evaluating the "Learning on Graphs" Conference Experience

Rieck, Bastian, Coupette, Corinna

arXiv.org Artificial Intelligence | Jun-1-2023

With machine learning conferences growing ever larger, and reviewing processes becoming increasingly elaborate, more data-driven insights into their workings are required. In this report, we present the results of a survey accompanying the first "Learning on Graphs" (LoG) Conference. The survey was directed to evaluate the submission and review process from different perspectives, including authors, reviewers, and area chairs alike. The first "Learning on Graphs" (LoG) Conference (9-12 December, 2022) was remarkable in more ways than one: starting from scratch, the conference aims to be the place for graph learning research, making use of an advisory committee that consists of international experts in the field. Moreover, at its core, LoG wants to be known for its exceptional review quality.

artificial intelligence, machine learning, respondent, (16 more...)

arXiv.org Artificial Intelligence

2306.00586

Genre:

Research Report (0.64)
Questionnaire & Opinion Survey (0.46)
Personal (0.46)
Overview (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.89)

Avoiding a Tragedy of the Commons in the Peer Review Process

Sculley, D, Snoek, Jasper, Wiltschko, Alex

arXiv.org Machine Learning | Dec-18-2018

Peer review is the foundation of scientific publication, and the task of reviewing has long been seen as a cornerstone of professional service. However, the massive growth in the field of machine learning has put this community benefit under stress, threatening both the sustainability of an effective review process and the overall progress of the field. In this position paper, we argue that a tragedy of the commons outcome may be avoided by emphasizing the professional aspects of this service. In particular, we propose a rubric to hold reviewers to an objective standard for review quality. In turn, we also propose that reviewers be given appropriate incentive. As one possible such incentive, we explore the idea of financial compensation on a per-review basis. We suggest reasonable funding models and thoughts on long term effects.

artificial intelligence, machine learning, reviewer, (16 more...)

arXiv.org Machine Learning

1901.06246

Country: North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.52)

Simulation Study on a New Peer Review Approach

Steppi, Albert, Qu, Jinchan, Tao, Minjing, Zhao, Tingting, Pang, Xiaodong, Zhang, Jinfeng

arXiv.org Artificial Intelligence | Jun-11-2018

The increasing volume of scientific publications and grant proposals has generated an unprecedentedly high workload for scientific communities. Consequently, review quality has been decreasing and review outcomes have become less correlated with the real merits of the papers and proposals. A novel distributed peer review (DPR) approach has recently been proposed to address these issues. The new approach assigns principal investigators (PIs) who submitted proposals (or papers) to the same program as reviewers. Each PI reviews and ranks a small number (such as seven) of other PIs' proposals. The individual rankings are then used to estimate a global ranking of all proposals using the Modified Borda Count (MBC). In this study, we perform simulation studies to investigate several parameters important for decision making when adopting this new approach. We also propose a new method called Concordance Index-based Global Ranking (CIGR) to estimate global ranking from individual rankings. An efficient simulated annealing algorithm is designed to search for the optimal Concordance Index (CI). Moreover, we design a new balanced review assignment procedure, which can result in significantly better performance for both MBC and CIGR methods. We found that CIGR performs better than MBC when the review quality is relatively high. As review quality and review difficulty are tightly correlated, we constructed a boundary in the space of review quality vs review difficulty that separates the CIGR-superior and MBC-superior regions. Finally, we propose a multi-stage DPR strategy based on CIGR, which has the potential to substantially improve the overall review performance while reducing the review workload.
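
The aggregation step described above, turning many partial rankings into one global ranking, can be sketched as follows. This uses one common formulation of a Borda-style count for partial ballots (the best of m ranked proposals earns m points, the worst earns 1, averaged over the reviewers who ranked it) and a plain pairwise-agreement score. Both are illustrative assumptions and may differ from the exact MBC and Concordance Index definitions used in the paper; the function names are hypothetical.

```python
from collections import defaultdict


def modified_borda(rankings):
    """Aggregate partial rankings (each listed best first) into a global ranking."""
    points = defaultdict(float)
    counts = defaultdict(int)
    for ranking in rankings:
        m = len(ranking)
        for position, proposal in enumerate(ranking):
            points[proposal] += m - position  # best gets m points, worst gets 1
            counts[proposal] += 1
    mean_points = {p: points[p] / counts[p] for p in points}
    # Highest mean score first; ties broken by proposal name for determinism.
    return sorted(mean_points, key=lambda p: (-mean_points[p], p))


def concordance(global_ranking, rankings):
    """Fraction of pairwise orderings in the individual rankings that the global ranking preserves."""
    position = {p: i for i, p in enumerate(global_ranking)}
    agree = total = 0
    for ranking in rankings:
        for i in range(len(ranking)):
            for j in range(i + 1, len(ranking)):
                total += 1
                if position[ranking[i]] < position[ranking[j]]:
                    agree += 1
    return agree / total if total else 0.0


# Four reviewers, each ranking three of the four proposals.
rankings = [
    ["A", "B", "C"],
    ["B", "C", "D"],
    ["A", "C", "D"],
    ["A", "B", "D"],
]
global_ranking = modified_borda(rankings)
print(global_ranking, concordance(global_ranking, rankings))
```

A search procedure such as simulated annealing could then perturb the global ranking and keep changes that raise the concordance score, which is the general shape of the CIGR idea in the abstract.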

artificial intelligence, machine learning, proposal, (18 more...)

arXiv.org Artificial Intelligence

1806.08663

Country:

North America > United States > Ohio > Franklin County > Columbus (0.04)
North America > United States > Florida > Leon County > Tallahassee (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
